[57] ABSTRACT

A computer method for online mining of quantitative association
rules consists of two stages: a preprocessing stage followed by
an online rule generation stage. The preprocessing stage reduces
the required computational effort by organizing the relationship
between antecedent attributes into a hierarchically arranged
multidimensional indexing structure. The resulting structure
facilitates the second stage, online processing, which generates
the quantitative association rules. The online rule generation
stage uses the multidimensional index structure created by the
preprocessing stage: it first finds the areas in the data which
correspond to the rules, and then uses a merging step to create
a merged tree which carefully combines interesting regions to
give a hierarchical representation of the rule set. The merged
tree is then used to generate the rules.
[FIG. 4(b): unmerged rule tree. Node (1): Age[0-100]=>FirstTimeBuyer=1.
Node (2): Age[0-45]=>FirstTimeBuyer=1. Nodes (3), (4), (6), (7): null,
Age[25-45]=>FirstTimeBuyer=1, null, null. Note: Null nodes contain no rules.]

[FIG. 5(b): merged rule tree. Age[0-100]=>FirstTimeBuyer=1, conf.=50%;
Age[0-45]=>FirstTimeBuyer=1, conf.=55%; Age[25-45]=>FirstTimeBuyer=1,
conf.=80%.]

ON-LINE MINING OF QUANTITATIVE
ASSOCIATION RULES

BACKGROUND OF THE INVENTION 

1. Field of the Invention
The present invention relates generally to online searching
for data dependencies in large databases and more
particularly to an online method of data mining of data items
to find quantitative association rules, where the data items
comprise various kinds of quantitative and categorical
attributes.

2. Discussion of the Prior Art 

Data mining, also known as knowledge discovery in
databases, has been recognized as a new area for database
research. The volume of data stored in electronic format has
increased dramatically over the past two decades. The
increase in use of electronic data gathering devices such as
point-of-sale or remote sensing devices has contributed to
this explosion of available data. Data storage is becoming
easier and more attractive to the business community as
large amounts of computing power and data storage
resources become available at increasingly reduced costs.

With much attention focused on the accumulation of data,
there arose a complementary need to focus on how this
valuable resource could be utilized. Businesses soon recognized
that valuable insights could be gleaned by decision-makers
who could make use of the stored data. By using data
from bar code companies, or sales data from catalog
companies, it is possible to gain valuable information about
customer buying behavior. The derived information might
be used, for example, by retailers in deciding which items to
shelve in a supermarket, or for designing a well targeted
marketing program, among others. Numerous meaningful
insights can be unearthed from the data utilizing proper
analysis techniques. In the most general sense, data mining
is concerned with the analysis of data and the use of software
techniques for finding patterns and regularities in sets of
data. The objective of data mining is to source out discernible
patterns and trends in data and infer association rules
from these patterns.

Data mining technologies are characterized by intensive
computations on large volumes of data. Large databases are
definable as consisting of a million records or more. In a
typical application, end users will test association rules such
as: "75% of customers who buy Cola also buy corn chips",
where 75% refers to the rule's confidence factor. The
support of the rule is the percentage of transactions that
contain both Cola and corn chips.

To date the prior art has not addressed the issue of online
mining but has instead focused on an itemset approach.
IBM Almaden's project called Quest is based upon this
method. A significant drawback of the itemset approach is
that as the user tests the database for association rules at
differing values of support and confidence, multiple passes
have to be made over the database, which could be of the
order of gigabytes. For very large databases, this may
involve a considerable amount of I/O and in some situations,
it may lead to unacceptable response times for online
queries. A user must make multiple queries on a database
because it is difficult to guess a priori how many rules might
satisfy a given level of support and confidence. Typically
one may be interested in only a few rules. This makes the
problem all the more difficult, since a user may need to run
the query multiple times in order to find appropriate levels
of minimum support and minimum confidence in order to
mine the rules. In other words, the problem of mining
association rules may require considerable manual parameter
tuning by repeated queries, before useful business
information can be gleaned from the transaction database.
The processing methods of mining described heretofore are
therefore unsuitable for repeated online queries as a result of
the extensive disk I/O or computation leading to unacceptable
response times. The need for expanding the capabilities
of data mining to the internet requires dynamic online
methods rather than the batch oriented method of the itemset
approach. It is therefore a primary object of the invention to
provide a computationally efficient method for making
online queries on a database to evaluate the strength of
association rules utilizing user supplied levels of support and
confidence as predictors.

It is a further object of the invention to discover
quantitative association rules.

SUMMARY OF THE INVENTION 

The present invention is directed to a method for efficiently
performing online mining of quantitative association
rules. An association rule can be generally defined as a
conditional statement that suggests that there exists some
correlation between its two component parts, antecedent and
consequent. In a quantitative association rule both the antecedent
and consequent are composed from some user specified
combination of quantitative and categorical attributes.
Along with the proposed rule, the user would provide three
additional inputs representing the confidence and support
level of interest to the user and a value referred to as interest
level. These inputs provide an indication of the strength of
the rule proposed by the user (the user query), in other words,
the strength of the suggested correlation between antecedent
and consequent defined by the user query.

In order to carry out the object of the present invention,
there is disclosed a method for preprocessing the raw data
by utilizing the antecedent attributes to partition the data so
as to create a multidimensional indexing structure, followed
by an online rule generation step. By effectively pre-processing
the data into an indexing structure it is placed in
a form suitable to answer repeated online queries with
practically instantaneous response times. Once created, the
indexing structure obviates the need to make multiple passes
over the database. The indexing structure creates significant
performance advantages over previous techniques. The
indexing structure (pre-processed data) is stored in such a
way that online processing may be done by applying a graph
theoretic search algorithm whose complexity is proportional
to the size of the output. This results in an online algorithm
which is practically instantaneous in terms of response time,
minimizing excessive amounts of I/O or computation.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is an overall description of the computer network
in which this invention operates.

FIG. 2 is an overall description of the method performed
by the invention. It consists of two stages described by
FIGS. 2(a) and 2(b).

FIG. 2(a) is a description of the preprocessing stage.

FIG. 2(b) is a description of the on-line stage of the
algorithm.

FIG. 3 is a detailed description of how the index tree is
constructed using the antecedent set. It can be considered an
expansion of step 75 of FIG. 2(a).

FIG. 4 is a detailed description of how the unmerged rule
tree is generated from the index tree. It can be considered an
expansion of step 100 of FIG. 2(b).

FIG. 5 is a description of how the merged rule tree is built
from the unmerged rule tree.

FIG. 6 is a description of how the quantitative association
rules are generated from the merged rule tree at some user
specified interest level r.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

The present invention is directed to a method for online
data mining of quantitative association rules. Traditional
database queries consist of simple questions such as
"what were the sales of orange juice in January 1995 for the
Long Island area?". Data mining, by contrast, attempts to
source out discernible patterns and trends in the data and
infers rules from these patterns. With these rules the user is
then able to support, review and examine decisions in some
related business or scientific area. Consider, for example, a
supermarket with a large collection of items. Typical business
decisions associated with the operation concern what to
put on sale, how to design coupons, and how to place
merchandise on shelves in order to maximize profit, etc.
Analysis of past transaction data is a commonly used
approach in order to improve the quality of such decisions.
Modern technology has made it possible to store the so
called basket data that stores items purchased on a per-transaction
basis. Organizations collect massive amounts of
such data. The problem becomes one of "mining" a large
collection of basket data type transactions for association
rules between sets of items with some minimum specified
confidence. Given a set of transactions, where each transaction
is a set of items, an association rule is an expression
of the form X=>Y, where X and Y are sets of items.

An example of an association rule is: "30% of transactions
that contain beer also contain diapers; 2% of all transactions
contain both of these items". Here 30% is called the confidence
of the rule, and 2% the support of the rule.

Another example of such an association rule is the state- 
ment that 90% of customer transactions that purchase bread 
and butter also purchase milk. The antecedent of this rule, X, 
consists of bread and butter and the consequent, Y, consists 
of milk alone. Ninety percent is the confidence factor of the 
rule. It may be desirable, for instance to find all rules that 
have "bagels" in the antecedent which may help determine 
what products (the consequent) may be impacted if the store 
discontinues selling bagels. 

Given a set of raw transactions, D, the problem of mining
association rules is to find all rules that have support and
confidence greater than the user-specified minimum support
(minsupport s) and minimum confidence (minconfidence c).
Generally, the support of a rule X=>Y is the percentage of
customer transactions, or tuples in a generalized database,
which contain both X and Y itemsets. In more formal
mathematical terminology, the rule X=>Y has support s in
the transaction set D if s% of transactions in D contain X
union Y, X ∪ Y. The confidence of a rule X=>Y is defined
as the percentage of transactions that contain X which also
contain Y. Or more formally, the rule X=>Y has confidence
c in the transaction set D if c% of transactions in D that
contain X also contain Y. Thus if a rule has 90% confidence
then it means that 90% of the transactions containing X also
contain Y.
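
By way of illustration only, the two definitions above can be computed directly from a list of transactions. The following Python sketch is not part of the disclosed method; the function name support_and_confidence and the sample baskets are assumptions chosen for the example.

```python
from typing import Iterable, List, Set, Tuple


def support_and_confidence(
    transactions: List[Set[str]], x: Iterable[str], y: Iterable[str]
) -> Tuple[float, float]:
    """Return (support %, confidence %) of the rule X=>Y over the transactions."""
    x_items, y_items = frozenset(x), frozenset(y)
    n_total = len(transactions)
    # Transactions containing every item of X.
    n_x = sum(1 for t in transactions if x_items <= t)
    # Transactions containing every item of X union Y.
    n_xy = sum(1 for t in transactions if (x_items | y_items) <= t)
    support = 100.0 * n_xy / n_total if n_total else 0.0
    confidence = 100.0 * n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical basket data, invented for this example.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
support, confidence = support_and_confidence(baskets, {"bread", "butter"}, {"milk"})
```

In these five sample baskets, four contain bread and butter and three of those also contain milk, so the rule has 60% support and 75% confidence.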

As previously stated, an association rule is an expression
of the form X=>Y. For example, if the itemsets X and Y were
defined to be

X=[milk & cheese & butter]

Y=[eggs & ham] respectively

The rule may be interpreted as:

RULE: X=>Y, implies that given the occurrence of milk,
cheese and butter in a transaction, what is the likelihood
of eggs and ham appearing in that same transaction to
within some defined support and confidence level.
The support and confidence of the rule collectively define
the strength of the rule. There are a number of ways in which
a user may pose a rule to such a system in order to test its
strength. A non-exhaustive yet representative list of the kinds
of online queries that such a system can support includes:

(1) Find all association rules above a certain level of 
minsupport and minconfidence. 

(2) At a certain level of minsupport and minconfidence, 
find all association rules that have the set of items X in 
the antecedent. 

(3) At a certain level of minsupport and minconfidence, 
find all association rules that have the set of items Y in 
the consequent. 

(4) At a certain level of minsupport and minconfidence, 
find all association rules that have the set of items Y 
either in the antecedent or consequent or distributed 
between the antecedent and consequent. 

(5) Find the number of association rules/itemsets in any of 
the cases (1), (2), (3), (4) above. 

(6) At what level of minsupport do exactly k itemsets exist 
containing the set of items Z. 

The present method particularizes the method of discov- 
ering general association rules to finding quantitative rules 
from a large database consisting of a set of raw transactions, 
D, defined by various quantitative and categorical attributes. 

For example, a typical quantitative/categorical database
for a general marketing survey would consist of a series of
records where each record reflects some combination of
consumer characteristics and preferences:
Record (1): age=21, sex=male, homeowner=no
Record (2): age=43, sex=male, homeowner=yes
Record (3): age=55, sex=female, homeowner=no

In general, a quantitative association rule is a condition of
the form:

GENERAL RULE:

X1[l1 . . . u1], X2[l2 . . . u2], . . . , Xk[lk . . . uk], Y1=c1,
Y2=c2, . . . , Yr=cr => Z1=z1, Z2=z2
where X1, X2, . . . Xk correspond to quantitative antecedent
attributes, and Y1, Y2, . . . Yr correspond to
categorical antecedent attributes. Here [l1 . . . u1], [l2 . . .
u2], . . . [lk . . . uk] correspond to the ranges for the various
quantitative attributes. Z1 and Z2 correspond to a multiple
consequent condition.
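
By way of illustration only, the general rule form can be held in a small data structure and tested against the survey records listed earlier. The following Python sketch is not taken from the disclosed method: the class QuantitativeRule, the function rule_strength, the inclusive treatment of range bounds, and the sample rule Age[20-50], Sex=male => homeowner=yes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class QuantitativeRule:
    ranges: Dict[str, Tuple[float, float]]  # quantitative antecedent: Xi -> (li, ui)
    categories: Dict[str, Any]              # categorical antecedent: Yj -> cj
    consequent: Dict[str, Any]              # consequent: Zm -> zm

    def antecedent_holds(self, rec: Record) -> bool:
        # Range bounds are treated as inclusive (an assumption).
        in_ranges = all(lo <= rec[a] <= hi for a, (lo, hi) in self.ranges.items())
        in_cats = all(rec[a] == v for a, v in self.categories.items())
        return in_ranges and in_cats

    def consequent_holds(self, rec: Record) -> bool:
        return all(rec[a] == v for a, v in self.consequent.items())


def rule_strength(records: List[Record], rule: QuantitativeRule) -> Tuple[float, float]:
    """Return (support, confidence) as fractions between 0 and 1."""
    matched = [r for r in records if rule.antecedent_holds(r)]
    both = [r for r in matched if rule.consequent_holds(r)]
    support = len(both) / len(records) if records else 0.0
    confidence = len(both) / len(matched) if matched else 0.0
    return support, confidence


# The three survey records from the text.
records = [
    {"age": 21, "sex": "male", "homeowner": "no"},
    {"age": 43, "sex": "male", "homeowner": "yes"},
    {"age": 55, "sex": "female", "homeowner": "no"},
]
# Hypothetical rule: Age[20-50], Sex=male => homeowner=yes
rule = QuantitativeRule(
    ranges={"age": (20, 50)},
    categories={"sex": "male"},
    consequent={"homeowner": "yes"},
)
support, confidence = rule_strength(records, rule)
```

Records (1) and (2) satisfy the antecedent and only record (2) also satisfies the consequent, so this sample rule has a support of one third and a confidence of one half.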

The present method requires that a user supply three
inputs. The first is a proposed rule, otherwise referred to as
the user query, in the form of an antecedent/consequent pair.
In addition to the proposed rule, a user would supply values for
minimum required confidence (minconfidence=c) and minimum
required support (minsupport=s) to test the strength
of the proposed rule (user query).

Both the minimum confidence and minimum support
are as relevant to the discovery of quantitative association
rules as they are to the discovery of general association
rules. An example of a typical user input might be:

EXAMPLE B

Typical User Input
1. User supplies a proposed rule to be tested (query):

ANTECEDENT CONDITION                          CONSEQUENT CONDITION
Age[20-40], Salary[100k-200k], Sex=Female     => Cars=2

2. User supplies a confidence value for the proposed rule,
referred to as minconfidence, c.

Minconfidence=50%

3. User supplies a support value for the proposed rule,
minsupport, s.

Minsupport=10%

FIG. 1 is an overall description of the architecture of the
present method. There are assumed to be multiple clients 40
which can access the preprocessed data over the network 35.
The preprocessed data resides at the server 5. There may be
a cache 25 at the server end, along with the preprocessed
data 20. The preprocessing as well as the online processing
takes place in the CPU 10. In addition, a disk 15 is present
in the event that the data is stored on disk.

The present method comprises two stages, a pre-processing
stage followed by an online processing stage.
FIG. 2(a) shows an overall description of the preprocessing
step as well as the online processing (rule generation steps)
for the algorithm. The pre-processing stage involves the
construction of a binary index tree structure, see step 75 of
FIG. 2 and the associated detailed description of FIG. 3(a).
An index tree is a spatial data structure, well known in the
art, which is used as a means to index
multidimensional data. Related work in the prior art may be
found in Guttman, A., A Dynamic Index Structure for Spatial
Searching. Proceedings of the ACM SIGMOD Conference.
In the present method a variation on this index tree structure
is employed in order to perform the on-line queries. Antecedent
attributes are utilized to partition the data so as to
create a multidimensional indexing structure. The indexing
structure is a two-level structure where the higher level
nodes are associated with at most two successor nodes and
lower level nodes may have more than two successor nodes.
The construction of the indexing structure is crucial to
performing effective online data mining. The key advantage
resides in minimizing the amount of disk I/O required to
respond to user queries.

A graphical analogue of the indexing structure, stored in
computer memory, is shown in FIG. 3(b) in the form
of an index tree. An index tree is a well known spatial data
structure which is used in order to index multidimensional
data. A separate index structure will be created
in computer memory for each dimension, defined by a
particular quantitative attribute, specified by the user in the
online query. FIG. 3(b) is a specific example of an index tree
structure which represents the antecedent condition, "Age"
and its associated consequent condition, "FirstTimeBuyer".
To further clarify the concept of an index tree, FIG. 3(b)
could have represented the "Age" dimension in the example
below:



Sample User Query

ANTECEDENT CONDITION                    CONSEQUENT CONDITION
Salary[40k-85k], Age[0-100], Sex        => FirstTimeBuyer



In general there are no restrictions with respect to the 
quantity or combination of quantitative and categorical 
attributes which comprise the antecedent and consequent 
conditions. 

In FIG. 3(b) the root node of the index tree structure
defines the user specified quantitative attribute, Age[0-100].
Each of the successive nodes of the tree also represents the
quantitative attribute, Age, with increasingly narrower range
limits from the top to the bottom of the tree hierarchy. For
example, the binary successors to the root node for Age
[0-100] are Age[0-45] and Age[45-100]. The present
method stores two pieces of data at each node of the index
tree representing the confidence and support levels of interest.
For example, with reference to FIG. 3(b), at the root
node, two pieces of data are stored consisting of:

1. confidence level=50%

2. support level=function of data input to the raw database

defining the confidence and support for the user query
(antecedent/consequent pair),

Age[0-100]=>FirstTimeBuyer
at the root node.
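
By way of illustration only, a binary index tree over a single quantitative attribute, with a support and a confidence value stored at every node, might be built as in the following Python sketch. This is not the binarization algorithm of the disclosed method. The names IndexNode and build_index, the midpoint split (the figure itself splits Age[0-100] at 45), the max_depth stopping rule, and the sample ages are all assumptions made for the example. Support is taken here as the fraction of all records that fall in the node's range and satisfy the consequent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IndexNode:
    lo: float
    hi: float
    support: float      # fraction of all records in [lo, hi) that satisfy the consequent
    confidence: float   # fraction of the records in [lo, hi) that satisfy the consequent
    children: List["IndexNode"] = field(default_factory=list)


def build_index(
    points: List[Tuple[float, bool]],  # (attribute value, satisfies consequent?)
    lo: float,
    hi: float,
    total: int,                        # number of records in the whole database (> 0)
    depth: int = 0,
    max_depth: int = 3,
) -> IndexNode:
    n = len(points)
    hits = sum(1 for _, ok in points if ok)
    node = IndexNode(lo, hi, support=hits / total, confidence=hits / n if n else 0.0)
    if depth < max_depth and n > 1:
        mid = (lo + hi) / 2  # midpoint split: an assumption, not the patent's rule
        left = [p for p in points if p[0] < mid]
        right = [p for p in points if p[0] >= mid]
        node.children = [
            build_index(left, lo, mid, total, depth + 1, max_depth),
            build_index(right, mid, hi, total, depth + 1, max_depth),
        ]
    return node


# Hypothetical (age, FirstTimeBuyer) pairs, invented for this example.
people = [(22, True), (30, True), (41, False), (48, False),
          (63, True), (70, False), (85, False), (90, False)]
root = build_index(people, 0, 100, total=len(people), max_depth=1)
```

With this sample data the root node Age[0-100] stores a support and confidence of 0.375, and its two successors Age[0-50] and Age[50-100] store confidences of 0.5 and 0.25 respectively.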

FIG. 3(a) is the detailed flowchart of the preprocessing
stage of the algorithm, illustrated in FIG. 2 as element 100.
The process steps of this stage involve generating the binary
index tree structure and storing the support and confidence
levels for the consequent attribute at each node of the
structure, followed by utilizing a compression algorithm on
the lower levels of the structure to ensure that the index tree
fits into the available memory. Step 300 is the point of entry
into the preprocessing stage. Step 310 represents the software
to implement the process step of using a binarization
algorithm to generate a binary index tree. The binarization
step has been discussed in the prior art in Aggarwal C. C.,
Wolf J., Yu P. S., and Epelman M. A. The S-Tree: An efficient
index tree for multidimensional index trees. Symposium of
Spatial Databases, 1997. However, the present method
diverges from the prior art in at least one aspect. At Step 315,
the way in which the entries of an index node are organized
is unique in that both the support level and the confidence
level for each value of the consequent attribute are stored at
each node in the structure. Step 320 represents the software
to implement the process step of utilizing a compression
algorithm to compress the lower level index nodes into a
single node.

FIG. 4(a) is the detailed flowchart of the primary search
algorithm which is used in order to generate the unmerged
rule tree from the index tree, illustrated in FIG. 2(b) as
element 100. The algorithm requires as input, user specified
values for minconfidence c, minsupport s, and a user query
which consists of a Querybox Q and one or more right hand
side values, Z1=z1, Z2=z2. The Querybox is merely a
descriptive term to denote the lefthand or antecedent portion
of the user query. To further clarify the meaning of
Querybox, Example C below describes what is required of
an online user as input in the present method:

EXAMPLE C

Typical User Input
The user would specify:
(1.) a minimum confidence value, [minconfidence, c]
(2.) a minimum support value, [minsupport, s]
An online user would, in addition, be required to input a
user query (proposed rule) in the form of an (antecedent/
consequent) pair, items 3 and 4.
(3.) a Querybox, "Q" [the antecedent]
(4.) Z1=z1, Z2=z2, etc. [a consequent]
Item three, the Querybox, is further explained by the
following examples, and can generally consist of any combination
of quantitative and categorical attributes. Item four,
the consequent attribute, can consist of one or more categorical
attributes.

EXAMPLE 1

This user specified query consists of an antecedent
condition, querybox, with two dimensions, Age and
Lefthandedness, and a single categorical consequent
condition, asmoker.
Querybox/Age[0-24], Lefthanded==>asmoker

EXAMPLE 2

This user specified query consists of an antecedent
condition, querybox, with two dimensions, Height and
Income, and a multiple consequent condition.
Querybox/Height[5-7], Income[10k-40k]==>ownsahome, ownsacar

EXAMPLE 3

The user specified query consists of a single antecedent
condition, querybox, with a single dimension, Age, and a
single consequent condition.
Querybox/Age[10-43]==>asmoker
Example C above describes in general terms what a user
supplies as input to the method. Example D below provides
a representative example. Using the user query in example
2 above, a typical input/output result could look as follows:

EXAMPLE D

User specifies as input:

1. minconfidence=0.50

2. minsupport=0.4

3. Querybox (antecedent condition)=Height[5-6], Income
[10k-40k]

4. consequent condition of interest=ownsahome=1,
ownsacar=1
user query formed from items (3 and 4):
Height[5-7], Income[10k-40k]==>ownsahome,
ownsacar
Resulting output: generated rule Height[5.5-6.2], Income
[13k-27.4k]==>ownsahome=1, ownsacar=1
In general, the output can conceivably generate no rules,
one rule, or multiple rules. A single rule was generated in the
example above. The generated rule is said to satisfy the user
query, (antecedent/consequent pair), at the user specified
confidence and support level, 0.5 and 0.4 respectively.

The algorithm for generating the unmerged rule tree from
the index tree, defined by FIG. 4(a), proceeds by searching
all the nodes in the index tree one by one. Step 400 is the
point of entry into the primary search algorithm. Step 410
represents the software to implement the process step of
setting a pointer, CurrentNode, to point to the root node of the
index tree. Pointer CurrentNode will always point to the
particular node of the index tree which the algorithm is
presently searching. Step 420 defines LIST as a set of nodes
which are considered to be eligible nodes to be scanned by
the algorithm. LIST is initialized to contain only the
root node in step 420. Step 430 represents the software to
implement the process step of adding to LIST all the child
nodes of the node pointed to by CurrentNode which intersect
with Querybox Q, and have support at least equal to the user
supplied input value, minsupport, s. A child node is said to
intersect with Querybox Q when all of the antecedent
conditions associated with the child node are wholly contained
within the antecedent condition defined by the Querybox.
Step 440 is a decision step which determines whether
individual data records contained in CurrentNode satisfy
the consequent condition, Z1=z1 and Z2=z2, at least c
percent of the time. If the condition of step 440 is satisfied
then the algorithm proceeds to step 445. Step 445 generates
the rule corresponding to the set of attributes on the right
hand side, the consequent condition. Step 450 follows steps
440 and 445 and represents the software to implement the
process step of deleting the node presently pointed to by
CurrentNode from LIST and setting the pointer CurrentNode
to the next node contained in LIST. Step 460 determines
whether LIST is empty and terminates the algorithm when
this condition is met, see Step 470. Otherwise, the algorithm
returns to step 430 and repeats the steps for the node
currently pointed to by the pointer CurrentNode. Upon
termination of the algorithm, an unmerged rule tree is output
which consists of all nodes in the input index tree which
satisfy the user specified minimum support, minsupport s.

FIG. 5(a) is the detailed flowchart which describes the
process of constructing the merged rule tree from the
unmerged rule tree. The algorithm described by the flowchart
compresses the unmerged rule tree to obtain a hierarchical
representation of the rules. The unmerged rule tree is
traversed in depth first search order where at each node a
determination is made as to whether that node is meaningful.
A meaningful node is defined to be a node which has a rule
associated with it. A rule may or may not have been
associated with a node when the unmerged rule tree was
created. To further clarify the distinction between meaningful
and nonmeaningful nodes, refer back to FIG. 4(b), the
unmerged rule tree, where meaningful nodes correspond to
nodes 1, 2, and 4. All meaningful nodes are preserved in the
merged rule tree. If a node is determined not to be meaningful
then the algorithm either eliminates that node, or
merges multiple child nodes into a single node when certain
conditions are met.

Step 500 represents the point of entry into the algorithm.
Step 510 represents the software to implement the process
step of ensuring that the unmerged rule tree is traversed in
depth first search order. Step 515 represents the step of
proceeding to the next node in the unmerged rule tree in the
depth first traversal. Step 520 represents a decision step
which determines whether the current rule node is a meaningful
node. A branch is made to step 530 when the current
node is determined to be meaningful. Otherwise the algorithm
branches to step 540 thereby classifying the node as
nonmeaningful. Step 540 is a decision step which determines
whether the nonmeaningful node has a child node. If
the nonmeaningful node does not have a child node, a branch is
taken to step 550. Step 550 represents the software to
implement the process step of deleting the current nonmeaningful
node. Otherwise, if it is determined in step 540 that
the current node does have a child node, a branch will
be taken to step 560. Step 560 is a decision step for the
purpose of determining whether the current nonmeaningful
node has one or more than one child nodes. If the current
node has only a single child node then a branch is taken to
step 570. Step 570 represents the software to implement the
process step of deleting the current node and directly connecting
the parent and child nodes of the deleted nonmeaningful
node together in the index tree. Otherwise, in the case
where the current node is found to have multiple child nodes
a branch is taken to step 580. Step 580 is a decision step
which determines whether the minimum bounding rectangle
of the two child nodes is larger than that of the nonmeaningful
parent node. The minimum bounding rectangle is
defined by the upper and lower bounds (the range) of the
quantitative attribute for each child node. When the ranges
of the child nodes are combined and found to be broader
than the range of the parent node, a merger occurs. For
example, if the child nodes were defined as:

child node 1: age [10-20]

child node 2: age [30-40]

and the corresponding parent node were defined as:

parent node: age [10-30]
then a merger would occur in this example, since the
combination of the child attribute ranges yields a combined
range of [10-40] which is broader than the range specified
by the parent node, [10-30].

If the confidence of the minimum bounding rectangle of
the two child nodes exceeds that of the parent node, a branch
will occur to step 590. Step 590 represents the software to
perform the process step of adjusting the minimum bounding
rectangle of the parent to be the minimum bounding
rectangle of the two child nodes. A branch to decision step
600 determines whether there are any more nodes to traverse
in the tree. A branch to termination step 610 occurs if there
are no more nodes to traverse, otherwise process steps
490-515 are repeated for the remaining index nodes.
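
By way of illustration only, the search of FIG. 4(a) and the compaction of FIG. 5(a) can be sketched together in Python for a single quantitative attribute. This is a simplified reading, not the disclosed implementation. The names Node, search and merge are assumptions. The sketch follows the stated definition of "intersect" (the child range wholly contained in the Querybox), reads step 550 as applying to nonmeaningful nodes without children, visits nodes in post-order rather than the flowchart's order, and omits the confidence comparison that precedes step 590. The sub-ranges Age[0-25], Age[45-70] and Age[70-100], and all support values, are invented; only the three rule nodes and their confidences come from FIGS. 4(b) and 5(b).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    lo: float
    hi: float
    support: float
    confidence: float
    children: List["Node"] = field(default_factory=list)
    has_rule: bool = False  # True marks a "meaningful" node


def search(index_root: Node, q_lo: float, q_hi: float,
           minsupport: float, minconfidence: float) -> Node:
    """Sketch of FIG. 4(a): copy eligible index nodes into an unmerged rule tree."""
    rule_root = Node(index_root.lo, index_root.hi,
                     index_root.support, index_root.confidence)
    work = [(index_root, rule_root)]           # LIST, seeded with the root (step 420)
    while work:                                # step 460: stop when LIST is empty
        src, dst = work.pop(0)                 # step 450: advance CurrentNode
        dst.has_rule = src.confidence >= minconfidence   # steps 440 and 445
        for child in src.children:             # step 430
            inside = q_lo <= child.lo and child.hi <= q_hi
            if inside and child.support >= minsupport:
                twin = Node(child.lo, child.hi, child.support, child.confidence)
                dst.children.append(twin)
                work.append((child, twin))
    return rule_root


def merge(node: Node) -> List[Node]:
    """Sketch of FIG. 5(a): return the node(s) that replace `node` after compaction."""
    node.children = [kept for child in node.children for kept in merge(child)]
    if node.has_rule:
        return [node]              # meaningful nodes are always preserved
    if not node.children:
        return []                  # childless nonmeaningful node is deleted
    if len(node.children) == 1:
        return node.children       # single child is attached to the grandparent
    # Several children: shrink the node to the bounding range of its children.
    node.lo = min(c.lo for c in node.children)
    node.hi = max(c.hi for c in node.children)
    return [node]


# Hypothetical index tree for Age => FirstTimeBuyer.
index = Node(0, 100, 0.50, 0.50, [
    Node(0, 45, 0.275, 0.55, [Node(0, 25, 0.075, 0.30), Node(25, 45, 0.20, 0.80)]),
    Node(45, 100, 0.225, 0.45, [Node(45, 70, 0.12, 0.48), Node(70, 100, 0.105, 0.42)]),
])
unmerged = search(index, 0, 100, minsupport=0.05, minconfidence=0.50)
merged = merge(unmerged)
```

Under these assumed inputs the unmerged tree has seven nodes, of which the nodes for Age[0-100], Age[0-45] and Age[25-45] carry rules, and the merged tree reduces to the single chain Age[0-100], Age[0-45], Age[25-45].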

FIG. 6 is the detailed flowchart which describes the
process of using the merged rule tree as input to define the
rules at the user specified interest level r. The merged rule
tree is traversed in depth first order. Step 616 is the point of
entry into the flowchart. A user would specify an input value
for r, representing the interest level. Step 618 represents the
software to select the next node in the merged rule tree in
depth first order. Step 620 is a decision step which represents
the software which looks at all ancestral nodes of the current
node of interest to determine whether any of them has a
confidence value at least equal to 1/r of that of the current
node. A branch to Step 630 will be taken when the condition
is true. Step 630 represents the software to prune the rule
associated with the current node. If the condition is not met,
a branch to Step 640 is taken. Step 640 is a decision step
which determines whether there are any remaining nodes to be
evaluated in the merged rule tree. The process steps will be
repeated if there are additional nodes to be evaluated,
otherwise the process terminates at this point.
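
By way of illustration only, the pruning test of step 620 can be sketched in Python as follows. The names RuleNode and interesting_rules are assumptions, as is the choice to compare against every ancestor whether or not that ancestor's own rule was pruned. The sample tree uses the three confidences shown in FIG. 5(b), and the interest level r=1.2 is an invented input.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RuleNode:
    lo: float
    hi: float
    confidence: float
    children: List["RuleNode"] = field(default_factory=list)


def interesting_rules(root: RuleNode, r: float) -> List[Tuple[float, float, float]]:
    """Depth-first pass keeping rules that no ancestor matches to within a factor r."""
    kept: List[Tuple[float, float, float]] = []

    def visit(node: RuleNode, ancestor_confs: List[float]) -> None:
        threshold = node.confidence / r  # "1/r of the current node"
        # Step 620: prune when any ancestor's confidence reaches the threshold.
        if not any(a >= threshold for a in ancestor_confs):
            kept.append((node.lo, node.hi, node.confidence))
        for child in node.children:
            visit(child, ancestor_confs + [node.confidence])

    visit(root, [])
    return kept


# Merged rule tree of FIG. 5(b): Age[0-100] 50%, Age[0-45] 55%, Age[25-45] 80%.
tree = RuleNode(0, 100, 0.50, [RuleNode(0, 45, 0.55, [RuleNode(25, 45, 0.80)])])
rules = interesting_rules(tree, r=1.2)
```

With r=1.2 the rule for Age[0-45] is pruned, because its parent's confidence of 50% is at least 55%/1.2, while the rules for Age[0-100] and Age[25-45] are retained.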

While the invention has been particularly shown and
described with respect to illustrative and preferred embodiments
thereof, it will be understood by those skilled in the
art that the foregoing and other changes in form and details
may be made therein without departing from the spirit and
scope of the invention which should be limited only by the
scope of the appended claims.

Having thus described our invention, what we claim as 
new, and desire to secure by Letters Patent is: 

1. A computer program device readable by a machine, 
tangibly embodying a program of instructions executable by 65 
the machine to perform method steps for generating quan- 
titative association rules, the method steps comprising: 

a) receiving a query including antecedent and consequent
attributes where said antecedent and consequent 
attributes further comprise a plurality of quantitative 
and categorical attributes; 

b) organizing a relationship between said antecedent and 
consequent attributes; 

c) prestoring data defining the relationship between said 
antecedent attributes and data related to said conse- 
quent attributes; and 

d) deriving one or more quantitative association rules 
from prestored data in response to said prestoring step. 

2. The computer program device of claim 1, wherein step 
b further comprises partitioning said antecedent data hierarchically
into an index tree where said index tree comprises
a multiplicity of index nodes. 

3. The method of claim 2, wherein the step of partitioning 
said antecedent data hierarchically into the index tree further 
comprises: 

a) storing a first value at each index node of said index tree 
representing the actual support; and 

b) storing a second value at each index node of said index 
tree representing the frequency of occurrence for each
user query consequent attribute. 

4. The computer program device of claim 1, said answer 
further comprises one or more quantitative association rules,
an actual confidence value associated with each rule, an 
actual support value associated with each rule, and an 
interest level associated with each rule. 

5. The computer program device of claim 4, wherein said 
quantitative association rules consist of only those rules 
which are interesting, where interesting rules include those
rules whose computed interest level is at least equal to said 
user defined interest level. 

6. The computer program device of claim 5, wherein said 
interest level is defined as the minimum of a first and a 
second computed ratio, wherein said first ratio is defined as 
the actual confidence divided by an expected confidence and 
a second ratio is defined as the actual support divided by an 
expected support, wherein said expected confidence and 
support are computed values based on a presumption of 
statistical independence. 
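
By way of illustration only, the interest level recited above can be written as a short Python function. The function names and parameter names are assumptions made for this sketch. The expected confidence and expected support are taken as inputs, because their derivation under the presumption of statistical independence is not restated here.

```python
def interest_level(actual_confidence: float,
                   expected_confidence: float,
                   actual_support: float,
                   expected_support: float) -> float:
    """Interest level as the minimum of two ratios.

    first ratio  = actual confidence / expected confidence
    second ratio = actual support / expected support
    The expected values are supplied by the caller (assumption).
    """
    if expected_confidence <= 0 or expected_support <= 0:
        raise ValueError("expected values must be positive")
    confidence_ratio = actual_confidence / expected_confidence
    support_ratio = actual_support / expected_support
    return min(confidence_ratio, support_ratio)


def is_interesting(level: float, user_interest_level: float) -> bool:
    """A rule is retained when its level is at least the user defined level."""
    return level >= user_interest_level
```

For example, an actual confidence of 0.8 against an expected confidence of 0.5, together with an actual support of 0.2 against an expected support of 0.1, gives ratios of 1.6 and 2.0, so the interest level is the smaller of the two.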

7. The computer program device of claim 1, wherein said 
antecedent attributes are further comprised of categorical 
and quantitative attributes. 

8. The computer program device of claim 7, wherein said
quantitative attributes are further defined by a range con- 
sisting of a lower and upper bound. 

9. The method of claim 1, wherein step d further com- 
prises the steps of: 

i) searching all index nodes of said index tree to isolate 
those nodes whose antecedent attribute range corre- 
sponds to said user query antecedent attribute range; 

ii) selecting from those nodes which satisfy the criteria of 
step i, whose consequent attribute is at least equal to 
said user defined value of minimum confidence; and 

iii) building a merge tree from those nodes which satisfy 
the criteria of steps i and ii. 

10. The computer program device of claim 9, wherein step 
iii further comprises deleting meaningless nodes and com- 
bining other nodes to create said merge tree. 

11. The computer program device of claim 10, wherein a 
meaningless node is a node which does not have a corre- 
sponding calculated value of confidence at least equal to said 
user defined value of minimum confidence. 

12. The computer program device of claim 10, wherein
the merge tree may be built either for a single or multiple 
consequent attributes. 

13. The computer program device according to claim 1
wherein the step of constructing a merged rule tree com- 
prises 

a) traversing each node of the unmerged rule tree in post 
order; 

b) evaluating each traversed node for inclusion or exclu- 
sion in the unmerged rule tree, further comprising the 
steps of: 

i) determining whether each said user defined conse- 
quent attribute value is greater than the consequent 
attribute value stored at said node; 

ii) preserving said node in said merged rule tree when 
the condition of step i is satisfied; 

iii) deleting said node from said merged rule tree when 
the condition of step i is not satisfied and said node 
has no associated child nodes; 

iv) deleting said node from said merged rule tree when
the condition of step i is not satisfied and said node
has one child node;

v) adjusting the range of said consequent attribute when 
the condition of step i is not satisfied; 

vi) directly associating an ancestor node and child node 
of said deleted node when the condition of step iv is 
satisfied; and 

vii) repeating steps i-vi until all nodes have been 
traversed in post order. 
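
The post order merging recited above may be sketched in Python as follows. This is a simplified illustration under stated assumptions: the class TreeNode, the flag has_rule and the function merge are hypothetical names, the flag stands in for the outcome of the test of step i, and the range adjustment of step v is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """Hypothetical rule-tree node; has_rule marks a node that holds a rule."""
    label: str
    has_rule: bool
    children: List["TreeNode"] = field(default_factory=list)


def merge(node: TreeNode) -> Optional[TreeNode]:
    """Post order merge: children are processed before their parent.

    Returns the node that takes this node's place under its parent,
    or None when the node is dropped entirely.
    """
    merged_children = []
    for child in node.children:
        replacement = merge(child)
        if replacement is not None:
            merged_children.append(replacement)
    node.children = merged_children

    if node.has_rule:
        return node                  # preserve a node that holds a rule
    if not node.children:
        return None                  # rule-less node without children: delete
    if len(node.children) == 1:
        return node.children[0]      # delete and link ancestor to the one child
    return node                      # rule-less node joining several subtrees
```

Because children are handled first, a rule-less node whose children are all deleted becomes childless by the time it is examined and is itself deleted in the same pass.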

14. A computer program device readable by a machine, 
tangibly embodying a program of instructions executable by 
the machine to perform method steps for generating quan- 
titative association rules, the method steps comprising: 

a) receiving data including a user defined value of mini- 
mum support, a user defined value of minimum 
confidence, and a user query comprising an antecedent 
and consequent condition where said antecedent and 
consequent condition further comprise a plurality of 
quantitative and categorical attributes; 

b) constructing in memory an index tree comprised of one 
or more dimensions, where each dimension is defined 
by one of the quantitative attributes, said index tree 
including a plurality of index nodes where said index 
nodes further include a plurality of data records; 

c) constructing in memory an unmerged rule tree from 
said index tree; 

d) constructing in memory a merged rule tree from said 
unmerged rule tree; 

e) generating one or more quantitative association rules 
from those index nodes that satisfy said user query and 
whose support is at least equal to said minimum 
support, and whose confidence is at least equal to said 
minimum confidence; and 

f) displaying to a user output data including:

said quantitative association rules from the generating
step;

a value of actual confidence associated with each
generated quantitative association rule;

a value of support associated with each generated
quantitative association rule; and

a value of interest level associated with each generated
quantitative association rule.

15. The computer program device according to claim 14 
wherein the step of generating quantitative association rules 
is repeated so that said user query is interactively modified 
to further define said association rules.

16. The computer program device of claim 14 wherein the 
step of constructing an index tree comprises the steps of: 

1) constructing a binary index tree of one or more
dimensions, where each dimension is defined by one of
said user supplied quantitative antecedent attributes;
and

2) storing at each index node said support level and
confidence level. 
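
A one-dimensional instance of such an index tree may be sketched in Python as follows. The midpoint split policy, the half-open ranges, the depth limit, and all names (Record, BinaryIndexNode, build) are assumptions made for this illustration and are not recited in the claims.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[float, bool]   # (antecedent value, whether the consequent holds)


@dataclass
class BinaryIndexNode:
    low: float
    high: float
    support: float       # fraction of all records falling in [low, high)
    confidence: float    # fraction of those records where the consequent holds
    left: Optional["BinaryIndexNode"] = None
    right: Optional["BinaryIndexNode"] = None


def build(records: List[Record], low: float, high: float,
          total: int, depth: int) -> BinaryIndexNode:
    """Build a binary index over one attribute by midpoint splits (assumed policy)."""
    inside = [r for r in records if low <= r[0] < high]
    hits = sum(1 for r in inside if r[1])
    node = BinaryIndexNode(
        low, high,
        support=len(inside) / total if total else 0.0,
        confidence=hits / len(inside) if inside else 0.0,
    )
    if depth > 0 and inside:
        mid = (low + high) / 2
        node.left = build(inside, low, mid, total, depth - 1)
        node.right = build(inside, mid, high, total, depth - 1)
    return node
```

Storing the two values at every node means that a later query can read support and confidence for a range directly from the tree without rescanning the records.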

17. The computer program device of claim 14 wherein the 
step of constructing an unmerged rule tree comprises the 
steps of: 

i) searching each node of said index tree; and 

ii) selecting those nodes which contain rules which satisfy 
the user specified consequent condition and have con- 
fidence at least equal to said user defined value of 
minimum confidence, and a value of support at least 
equal to said user defined value of minimum support. 

18. The computer program device according to claim 17, 
wherein step ii further comprises: 

i) constructing a pointer; 

ii) equating said pointer to a root node in said index tree; 

iii) adding said node associated with said pointer to a list;

iv) adding all children of the node pointed to by said 
pointer with antecedent attribute wholly contained 
within the parameters of said user specified antecedent 
attribute and have a minimum support value at least 
equal to said user defined minimum support; 

v) determining whether the data records stored at the node 
pointed to by said pointer at least equal to the user 
specified consequent condition and have a confidence 
at least equal to said user defined minimum confidence 
for the node pointed by said pointer; 

vi) generating a quantitative association rule associated 
with said consequent conditions; 

vii) deleting said node from said list when the conditions 
of the previous step are not satisfied;

viii) determining whether said list is empty; 

ix) terminating when said list is empty; 

x) when the condition of step ix is not satisfied, equating
said pointer to the next node of said index tree; and 

xi) repeating steps iii-x when the condition of step ix is 
not satisfied. 
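
The list-driven search recited above may be sketched in Python as follows. The sketch assumes a single quantitative antecedent attribute and follows the claim wording closely: the root is placed on the list unconditionally, and a child is added only when its range lies wholly within the queried range and its support meets the minimum. The names IndexNode and find_rules and the field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IndexNode:
    """Hypothetical one-dimensional index node (field names are assumptions)."""
    low: float
    high: float
    support: float
    confidence: float
    children: List["IndexNode"] = field(default_factory=list)


def find_rules(root: IndexNode, q_low: float, q_high: float,
               min_support: float,
               min_confidence: float) -> List[Tuple[float, float, float]]:
    """Collect (low, high, confidence) for every node that yields a rule."""
    rules = []
    worklist = deque([root])            # steps i-iii: start from the root
    while worklist:                     # steps viii-ix: stop when the list is empty
        node = worklist.popleft()       # steps vii and x: remove node, advance
        for child in node.children:     # step iv: add qualifying children
            inside = q_low <= child.low and child.high <= q_high
            if inside and child.support >= min_support:
                worklist.append(child)
        if node.confidence >= min_confidence:    # steps v-vi: generate a rule
            rules.append((node.low, node.high, node.confidence))
    return rules
```

A child that fails the support test is never placed on the list, so its descendants are not examined; this is where the minimum support cuts down the search.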

19. A method of online mining of a large database having 
a plurality of records, and each record having a plurality of 
quantitative and categorical items for providing quantitative 
association rules comprising the steps of: 

a) receiving a user query comprising antecedent and 
consequent attributes; 

b) organizing the relationship between said antecedent 
and consequent attributes; 

c) prestoring data defining the relationship between said 
antecedent attributes and data related to said conse- 
quent attributes; and 

d) deriving one or more quantitative association rules 
from prestored data in response to said user query. 

20. The method of claim 19, wherein step b) further 
comprises partitioning said antecedent data hierarchically 
into an index tree where said index tree comprises a multiplicity
of index nodes.

21. The method of claim 20, wherein the step of parti- 
tioning said antecedent data hierarchically into the index tree 
further comprises: 

a) storing a first value at each index node of said index tree 
representing the actual support; and 

b) storing a second value at each index node of said index 
tree representing the frequency of occurrence for each 
user query consequent attribute. 

22. The method of claim 19 wherein step a) further 
includes receiving one or more of a user defined value of 
minimum confidence, a user defined value of minimum 
support, and a user defined value of interest level. 

23. The method of claim 22 wherein step d) includes
deriving one or more quantitative association rules that
satisfy said user query, including said one or more of said
minimum support, said minimum confidence and said interest
level.

24. The method of claim 23, further including displaying 
to a user one or more quantitative association rules, an actual 
confidence value associated with each rule, an actual support 
value associated with each rule, and an interest level associated
with each rule.

25. The method of claim 24, wherein said quantitative 
association rules include only those rules which are 
interesting, where said interesting rules include those rules 
whose computed interest level is at least equal to said user 
defined interest level.

26. The method of claim 25, wherein said interest level is 
defined as the minimum of a first and a second computed 
ratio, wherein said first ratio is defined as the actual confidence
divided by an expected confidence and a second ratio
is defined as the actual support divided by an expected
support, wherein said expected confidence and support are 
computed values based on a presumption of statistical 
independence. 

27. The method of claim 19, wherein said antecedent 
attributes are further comprised of categorical and quantitative
attributes.

28. The method of claim 27, wherein said quantitative 
attributes are further defined by a range consisting of a lower 
and upper bound. 

29. The method of claim 19, wherein step d) further
comprises the steps of: 

i) searching all index nodes of said index tree to isolate 
those nodes whose antecedent attribute range corre- 
sponds to said user query antecedent attribute range; 

ii) selecting from those nodes which satisfy the criteria of 
step i, whose consequent attribute is at least equal to 
said user defined value of minimum confidence; and 

iii) building the merge tree from those nodes which satisfy 
the criteria of steps i and ii. 

30. The method of claim 29, wherein step iii further 
comprises deleting meaningless nodes and combining other 
nodes to create said merge tree. 

31. The method of claim 30, wherein a meaningless node 
is a node which does not have a corresponding calculated 
value of confidence at least equal to said user defined value 
of minimum confidence. 

32. The method of claim 30, wherein the merge tree may 
be built either for a single or multiple consequent attributes. 

33. The method according to claim 19 wherein the step of 
constructing a merged rule tree comprises 

a) traversing each node of the unmerged rule tree in post 
order; 

b) evaluating each traversed node for inclusion or exclusion
in the unmerged rule tree, further comprising the
steps of: 

i) determining whether each said user defined conse- 
quent attribute value is greater than the consequent 
attribute value stored at said node; 

ii) preserving said node in said merged rule tree when
the condition of step i is satisfied; 

iii) deleting said node from said merged rule tree when 
the condition of step i is not satisfied and said node 
has no associated child nodes; 

iv) deleting said node from said merged rule tree when
the condition of step i is not satisfied and said node
has one child node;

v) adjusting the range of said consequent attribute when
the condition of step i is not satisfied; 

vi) directly associating an ancestor node and child node 
of said deleted node when the condition of step iv is 
satisfied; and 

vii) repeating steps i-vi until all nodes have been 
traversed in post order. 

34. A computer process of online mining for a large 
database having a plurality of records, each record having a 
plurality of quantitative and categorical items for providing 
quantitative association rules comprising the steps of: 

a) receiving data including a user defined value of mini- 
mum support, a user defined value of minimum 
confidence, a user defined value of interest, and a user 
query comprising an antecedent and consequent con- 
dition where said antecedent and consequent condition 
further comprise a plurality of quantitative and cat- 
egorical attributes; 

b) constructing in memory an index tree comprised of one
or more dimensions, where each dimension is defined 
by one of the quantitative attributes, said index tree 
including a plurality of index nodes where said index 
nodes further include a plurality of data records; 

c) constructing in memory an unmerged rule tree from 
said index tree; 

d) constructing in memory a merged rule tree from said 
unmerged rule tree; 

e) generating one or more quantitative association rules 
from those index nodes that satisfy said user query and 
whose support is at least equal to said minimum 
support, and whose confidence is at least equal to said 
minimum confidence; and 

f) displaying to a user output data including: 

said quantitative association rules from the generating 
step; 

a value of actual confidence associated with each 
generated quantitative association rule; 

a value of support associated with each generated 
quantitative association rule; and 

a value of interest level associated with each generated 
quantitative association rule. 

35. The method according to claim 34 wherein the step of 
generating quantitative association rules is repeated so that 
said user query is interactively modified to further define 
said association rules. 

36. The method according to claim 34 wherein the step of 
constructing an index tree comprises the steps of:

1) constructing a binary index tree of one or more dimensions,
where each dimension is defined by one of said user 
supplied quantitative antecedent attributes; 

2) storing at each index node said support level and 
confidence level. 

37. The method according to claim 34 wherein the step of 
constructing an unmerged rule tree comprises the steps of:

i) searching each node of said index tree; 

ii) selecting those nodes which contain rules which satisfy 
the user specified consequent condition and have con- 
fidence at least equal to said user defined value of 
minimum confidence, and a value of support at least 
equal to said user defined value of minimum support. 

38. The method according to claim 37, wherein step ii 
further comprises: 

i) constructing a pointer; 

ii) equating said pointer to a root node in said index tree; 

iii) adding said node associated with said pointer to a list; 

iv) adding all children of the node pointed to by said
pointer with antecedent attribute wholly contained
within the parameters of said user specified antecedent
attribute and have a minimum support value at least
equal to said user defined minimum support;

v) determining whether the data records stored at the node 
pointed to by said pointer at least equal to the user 
specified consequent condition and have a confidence 
at least equal to said user defined minimum confidence 
for the node pointed by said pointer;

vi) generating a quantitative association rule associated 
with said consequent conditions; 

vii) deleting said node from said list when the conditions
of the previous step are not satisfied; 

viii) determining whether said list is empty; 

ix) terminating when said list is empty; 

x) when the condition of step ix is not satisfied, equating 
said pointer to the next node of said index tree; and 

xi) repeating steps iii-x when the condition of step ix is 
not satisfied. 

* * * * *